load libraries
library(dplyr)
library(ggplot2)
library(tidyr)
library(plotly)This document created to make easier understanding data that we are working.
In this document we will observe some tables:
ds_year ( describes the number of titles and the number of copies for each year, since 2018)
ds_genre ( describes the number of titles and the number of copies for each year, by genre)
ds_georaphy ( describes the number of titles and the number of copies for each year, by place)
ds_language (describes the number of titles and the number of copies for each year, by language)
ds_pubtype ( describes the number of titles and the number of copies for each year, by publication type)
ds_ukr_rus (describes the number of titles and the number of copies, the percentage of Ukrainian and russian books for each year)
P.s: all table have information for 2005 - 2024, expect ds_language
Load libraries that are used in this report
library(dplyr)
library(ggplot2)
library(tidyr)
library(plotly)We will be working with the data that describes the two main measures: the number of titles (title_count) and the number of copies (copy_count). We have specific tables that describe these measures under different conditions.
getwd()
ds_genre <- readRDS("../../data-private/derived/manipulation/ds_genre.rds") # load ds_genre
ds_year <- readRDS("../../data-private/derived/manipulation/ds_year.rds") # load ds_year
ds_language <- readRDS("../../data-private/derived/manipulation/ds_language.rds") # load ds_language
ds_geography <- readRDS("../../data-private/derived/manipulation/ds_geography.rds") # load ds_geography
ds_pubtype <- readRDS("../../data-private/derived/manipulation/ds_pubtype.rds") # load ds_pubtype
ds_ukr_rus <- readRDS("../../data-private/derived/manipulation/ds_ukr_rus.rds") # load ds_ukr_rusWe want to do a short visualization for each table, to make easier understanding the data (you observe few graphs that describe the data)
This table has 40 rows, and 3 columns. Also, in the measure column we have only two possible values - title_count means the number of titles and copy_count means the number of copies. In the column value we have the numerical numbers for both measures.
The first graph observe the number of copies for each year.
years_all <- seq(min(ds_year$yr), max(ds_year$yr))
g_year_copy <- ds_year %>%
filter(measure == "copy_count") %>%
ggplot(aes(x = yr, y = value)) +
geom_point() +
geom_line() +
scale_x_continuous(breaks = years_all) +
scale_y_continuous(breaks = seq(0, 80000, by = 10000), limits = c(0, 80000)) +
labs(
title = "Number of Copies by Year"
, x = "The year"
, y = "Number of copies (ths.)"
)+
theme_minimal() +
theme(
panel.grid.minor = element_blank()
,panel.border = element_rect(color = "black", fill = NA, size = 1)
)
g_year_copy_plotly <- ggplotly(g_year_copy)
g_year_copy_plotlyThe second graph observe the number of titles for each year.
g_year_title <- ds_year %>%
filter(
measure == "title_count"
) %>%
ggplot(aes(x = yr, y = value)) +
geom_point() +
geom_line() +
scale_x_continuous(breaks = years_all) +
scale_y_continuous(breaks = seq(0, 27500, by = 5000), limits = c(7500, 27500)) +
theme_minimal() +
labs(
title = "Number of Titles by Year"
, x = "The year"
, y = "Number of titles"
) +
theme(panel.grid.minor = element_blank()
,panel.border = element_rect(color = "black", fill = NA, size = 1))
g_year_title_plotly <- ggplotly(g_year_title)
g_year_title_plotlyrm(g_year_title, g_year_copy, years_all)What we can observe:
the most productive year for new titles and the year with most copies
the production trend
This table has 40 rows, and 15 columns. Also, in the measure column we have only two possible values - title_count means the number of titles and copy_count means the number of copies. From the third column, each column is a separate genre.
In a first graph, we see the number of copies of each genre, by year.
ds_genre_copy <-
ds_genre %>%
filter(
measure == "copy_count"
) %>%
group_by(yr) %>%
pivot_longer(
cols = -c(yr, measure), # adding genre column
names_to = "genre",
values_to = "value"
)
g_genre_copy<- ds_genre_copy %>%
ggplot(aes(x = genre, y = value, fill = genre)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 30000, by = 7500), limits = c(0, 37000)) +
labs(
title = "Number of Copies by Genre and Year",
x = "Genre",
y = "Number of Copies (ths.)",
fill = "Genre"
) +
theme_get() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank()
,legend.position = "bottom"
, panel.spacing.y = unit(5, "lines")
)
g_genre_copy_plotly <- ggplotly(g_genre_copy) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_genre_copy_plotlyIn a second graph, we can see the number of titles for each genre, by year.
ds_genre_title <-
ds_genre %>%
filter(
measure == "title_count"
) %>%
group_by(yr) %>%
pivot_longer(
cols = -c(yr, measure), # adding genre column
names_to = "genre",
values_to = "value"
)
g_genre_title <- ds_genre_title %>%
ggplot(aes(x = genre, y = value, fill = genre)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
labs(
title = "Number of Titles by Genre and Year",
x = "Genre",
y = "Number of Titles",
fill = "Genre"
) +
theme_get() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank()
,legend.position = "bottom"
, panel.spacing.y = unit(5, "lines"),
)
g_genre_title_plotly <- ggplotly(g_genre_title) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_genre_title_plotlyrm(g_genre_copy, ds_genre_copy, ds_genre_title, g_genre_title)This table has 40 rows, and 28 columns. Also, in the measure column we have only two possible values - title_count means the number of titles and copy_count means the number of copies. From the third column, each column is a separate geographical place.
region_groups <- list(
"Західна Україна" = c("Львівська", "Івано-Франківська", "Закарпатська", "Тернопільська", "Чернівецька", "Волинська", "Рівненська"),
"Центральна Україна" = c("Київська", "Житомирська", "Черкаська", "Кіровоградська", "Полтавська", "Вінницька"),
"Південна Україна" = c("Одеська", "Миколаївська", "Херсонська", "Запорізька"),
"Східна Україна" = c("Харківська", "Донецька", "Луганська", "Дніпропетровська"),
"Північна Україна" = c("Чернігівська", "Сумська")
, "Крим" = "Автономна Республіка Крим"
)
region_group_df <- tibble::tibble(
region_name = unlist(region_groups),
group = rep(names(region_groups), times = sapply(region_groups, length))
)
g_geography <- ds_geography %>%
pivot_longer(
cols = -c(yr, measure),
names_to = "region_name",
values_to = "value"
) %>%
left_join(region_group_df, by = "region_name") %>%
select(yr, measure, region_name, group, everything())This graph describes the number of copies for each region in Central Ukraine
g_geography_central_copies <- g_geography %>%
filter(measure == "copy_count", group == "Центральна Україна") %>%
complete(yr, region_name, fill = list(value = 0)) %>%
ggplot(aes(x = region_name, y = value, fill = region_name)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 750, by = 250)) +
coord_cartesian(ylim = c(0, 750)) +
labs(
title = "Number of Copies by Region (Central Ukraine) and Year",
x = "Region",
y = "Number of Copies (ths.)",
fill = "Region"
) +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank()
, panel.spacing.y = unit(6, "lines")
, panel.spacing.x = unit(2, "lines")
)
g_geography_central_copies_plotly <- ggplotly(g_geography_central_copies) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_geography_central_copies_plotlyrm(g_geography_central_copies)This graph describes the number of copies for each region in North Ukraine
g_geography_pivnichna_copies <- g_geography %>%
filter(measure == "copy_count", group == "Північна Україна") %>%
complete(yr, region_name, fill = list(value = 0)) %>%
ggplot(aes(x = region_name, y = value, fill = region_name)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 500, by = 250)) +
coord_cartesian(ylim = c(0, 500)) +
labs(
title = "Number of Copies by Region (North Ukraine) and Year",
x = "Region",
y = "Number of Copies (ths.)",
fill = "Region"
) +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank(),
panel.spacing.y = unit(6, "lines"),
panel.spacing.x = unit(2, "lines")
)
g_geography_pivnichna_copies_plotly <- ggplotly(g_geography_pivnichna_copies) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_geography_pivnichna_copies_plotlyrm(g_geography_pivnichna_copies)This graph describes the number of copies for each region in South Ukraine
g_geography_pivdena_copies <- g_geography %>%
filter(measure == "copy_count", group == "Південна Україна") %>%
complete(yr, region_name, fill = list(value = 0)) %>%
ggplot(aes(x = region_name, y = value, fill = region_name)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 500, by = 250)) +
coord_cartesian(ylim = c(0, 500)) +
labs(
title = "Number of Copies by Region (South Ukraine) and Year",
x = "Region",
y = "Number of Copies (ths.)",
fill = "Region"
) +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank(),
panel.spacing.y = unit(6, "lines"),
panel.spacing.x = unit(2, "lines")
)
g_geography_pivdena_copies_plotly <- ggplotly(g_geography_pivdena_copies) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_geography_pivdena_copies_plotlyrm(g_geography_pivdena_copies)This graph describes the number of copies for each region in West Ukraine
g_geography_zahidna_copies <- g_geography %>%
filter(measure == "copy_count", group == "Західна Україна") %>%
complete(yr, region_name, fill = list(value = 0)) %>%
ggplot(aes(x = region_name, y = value, fill = region_name)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 1250, by = 500)) +
coord_cartesian(ylim = c(0, 1250)) +
labs(
title = "Number of Copies by Region (West Ukraine) and Year",
x = "Region",
y = "Number of Copies (ths.)",
fill = "Region"
) +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank(),
panel.spacing.y = unit(6, "lines"),
panel.spacing.x = unit(2, "lines")
)
g_geography_zahidna_copies_plotly <- ggplotly(g_geography_zahidna_copies) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_geography_zahidna_copies_plotlyrm(g_geography_zahidna_copies)This graph describes the number of copies for each region in East Ukraine
g_geography_shidna_copies <- g_geography %>%
filter(measure == "copy_count", group == "Східна Україна") %>%
complete(yr, region_name, fill = list(value = 0)) %>%
ggplot(aes(x = region_name, y = value, fill = region_name)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 3500, by = 1000)) +
coord_cartesian(ylim = c(0, 3500)) +
labs(
title = "Number of Copies by Region (East Ukraine) and Year",
x = "Region",
y = "Number of Copies (ths.)",
fill = "Region"
) +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank(),
panel.spacing.y = unit(6, "lines"),
panel.spacing.x = unit(2, "lines")
)
g_geography_shidna_copies_plotly <- ggplotly(g_geography_shidna_copies) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_geography_shidna_copies_plotlyrm(g_geography_shidna_copies)This graph describes the number of copies for Crimea
g_geography_krym_copies <-
g_geography %>%
filter(
measure == "copy_count",
group == "Крим"
) %>%
ggplot(aes(x = yr, y = value, group = 1)) + # add group=1 for geom_line
geom_point() +
geom_line() +
labs(
title = "Number of Copies by Region (Krym) and Year",
x = "Year",
y = "Number of Copies (ths.)"
) +
scale_y_continuous(breaks = seq(0, 1500, by = 250), limits = c(0, 1500)) +
scale_x_continuous(breaks = 2005:2024) +
theme_minimal() +
theme(
panel.grid.minor = element_blank(),
legend.position = "none" # No legend needed, since only one line/region
)
g_geography_krym_copies_plotly <- ggplotly(g_geography_krym_copies)
g_geography_krym_copies_plotlyrm(g_geography_krym_copies)This graph describes the number of titles for each region in Central Ukraine
g_geography_central_title <- g_geography %>%
filter(measure == "title_count", group == "Центральна Україна") %>%
complete(yr, region_name, fill = list(value = 0)) %>%
ggplot(aes(x = region_name, y = value, fill = region_name)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 750, by = 250)) +
coord_cartesian(ylim = c(0, 750)) +
labs(
title = "Number of Titles by Region (Central Ukraine) and Year",
x = "Region",
y = "Number of Titles",
fill = "Region"
) +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank(),
panel.spacing.y = unit(6, "lines"),
panel.spacing.x = unit(2, "lines")
)
g_geography_central_title_plotly <- ggplotly(g_geography_central_title) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_geography_central_title_plotlyrm(g_geography_central_title)This graph describes the number of titles for each region in North Ukraine
g_geography_pivnichna_title <- g_geography %>%
filter(measure == "title_count", group == "Північна Україна") %>%
complete(yr, region_name, fill = list(value = 0)) %>%
ggplot(aes(x = region_name, y = value, fill = region_name)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 750, by = 250)) +
coord_cartesian(ylim = c(0, 750)) +
labs(
title = "Number of Titles by Region (North Ukraine) and Year",
x = "Region",
y = "Number of Titles",
fill = "Region"
) +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank(),
panel.spacing.y = unit(6, "lines"),
panel.spacing.x = unit(2, "lines")
)
g_geography_pivnichna_title_plotly <- ggplotly(g_geography_pivnichna_title) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_geography_pivnichna_title_plotlyrm(g_geography_pivnichna_title)This graph describes the number of titles for each region in South Ukraine
g_geography_pivdena_title <- g_geography %>%
filter(measure == "title_count", group == "Південна Україна") %>%
complete(yr, region_name, fill = list(value = 0)) %>%
ggplot(aes(x = region_name, y = value, fill = region_name)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 750, by = 250)) +
coord_cartesian(ylim = c(0, 750)) +
labs(
title = "Number of Titles by Region (South Ukraine) and Year",
x = "Region",
y = "Number of Titles",
fill = "Region"
) +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank(),
panel.spacing.y = unit(6, "lines"),
panel.spacing.x = unit(2, "lines")
)
g_geography_pivdena_title_plotly <- ggplotly(g_geography_pivdena_title) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_geography_pivdena_title_plotlyrm(g_geography_pivdena_title)This graph describes the number of titles for each region in West Ukraine
g_geography_zahidna_title <- g_geography %>%
filter(measure == "title_count", group == "Західна Україна") %>%
complete(yr, region_name, fill = list(value = 0)) %>%
ggplot(aes(x = region_name, y = value, fill = region_name)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 1250, by = 500)) +
coord_cartesian(ylim = c(0, 1250)) +
labs(
title = "Number of Titles by Region (West Ukraine) and Year",
x = "Region",
y = "Number of Titles",
fill = "Region"
) +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank(),
panel.spacing.y = unit(6, "lines"),
panel.spacing.x = unit(2, "lines")
)
g_geography_zahidna_title_plotly <- ggplotly(g_geography_zahidna_title) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_geography_zahidna_title_plotlyrm(g_geography_zahidna_title)This graph describes the number of titles for each region in East Ukraine
g_geography_shidna_title <- g_geography %>%
filter(measure == "title_count", group == "Східна Україна") %>%
complete(yr, region_name, fill = list(value = 0)) %>%
ggplot(aes(x = region_name, y = value, fill = region_name)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 4000, by = 1000)) +
coord_cartesian(ylim = c(0, 4000)) +
labs(
title = "Number of Titles by Region (East Ukraine) and Year",
x = "Region",
y = "Number of Titles",
fill = "Region"
) +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank(),
panel.spacing.y = unit(6, "lines"),
panel.spacing.x = unit(2, "lines")
)
g_geography_shidna_title_plotly <- ggplotly(g_geography_shidna_title) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_geography_shidna_title_plotlyrm(g_geography_shidna_title)This graph describes the number of titles for Crimea
g_geography_krym_title <-
g_geography %>%
filter(
measure == "title_count",
group == "Крим"
) %>%
ggplot(aes(x = yr, y = value, group = 1)) + # group=1 for line plot
geom_point() +
geom_line() +
labs(
title = "Number of Titles by Region (Krym) and Year",
x = "Year",
y = "Number of Titles"
) +
scale_y_continuous(breaks = seq(0, 1000, by = 250), limits = c(0, 1000)) +
scale_x_continuous(breaks = 2005:2024) +
theme_minimal() +
theme(
panel.grid.minor = element_blank(),
legend.position = "none" # no legend needed
)
g_geography_krym_title_plotly <- ggplotly(g_geography_krym_title)
g_geography_krym_title_plotlyrm(g_geography_krym_title)This table has 14 rows, and 39 columns. Also, in the measure column we have only two possible values - title_count means the number of titles and copy_count means the number of copies. From the third column, each column is new language. We have information only for 2018 - 2024 years.
Do to inspection of this table, we will observe each year separately. Each graph display only those language, which have at least 10 different titles and 10.000 copies.
g_language <- ds_language %>%
pivot_longer(
cols = -c(yr, measure),
names_to = "language",
values_to = "value"
)g_language_2018 <-
g_language %>%
filter(
yr == 2018,
(measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
) %>%
mutate(
measure = recode(measure,
copy_count = "Copies",
title_count = "Titles"),
language = recode(language,
`Кількома мовами народів світу` = "Декілька")
) %>%
ggplot(aes(x = language, y = value, fill = measure)) +
geom_col(position = "dodge") +
scale_y_continuous(breaks = seq(0, 40000, by = 5000), limits = c(0, 40000)) +
labs(
title = "Number of Copies and Titles (2018)",
x = "Language",
y = "Count",
fill = "Measure"
) +
theme_minimal() +
theme(
legend.position = "bottom"
, axis.text.x = element_text(size = 8)
)
g_language_2018_plotly <- ggplotly(g_language_2018) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.15,
xanchor = "center"
)
)
g_language_2018_plotlyrm(g_language_2018)g_language_2019 <-
g_language %>%
filter(
yr == 2019,
(measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
) %>%
mutate(
measure = recode(measure,
copy_count = "Copies",
title_count = "Titles"),
# Rename languages as needed
language = recode(language,
`Кількома мовами народів світу` = "Декілька")
) %>%
ggplot(aes(x = language, y = value, fill = measure)) +
geom_col(position = "dodge") +
scale_y_continuous(breaks = seq(0, 55000, by = 10000), limits = c(0, 55000)) +
labs(
title = "Number of Copies and Titles (2019)",
x = "Language",
y = "Count",
fill = "Measure"
) +
theme_minimal() +
theme(
legend.position = "bottom",
axis.text.x = element_text(size = 8)
)
g_language_2019_plotly <- ggplotly(g_language_2019) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.15,
xanchor = "center"
)
)
g_language_2019_plotlyrm(g_language_2019)g_language_2020 <-
g_language %>%
filter(
yr == 2020,
(measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
) %>%
mutate(
measure = recode(measure,
copy_count = "Copies",
title_count = "Titles"),
# Rename language as needed:
language = recode(language,
`Кількома мовами народів світу` = "Декілька")
) %>%
ggplot(aes(x = language, y = value, fill = measure)) +
geom_col(position = "dodge") +
scale_y_continuous(breaks = seq(0, 40000, by = 5000), limits = c(0, 40000)) +
labs(
title = "Number of Copies and Titles (2020)",
x = "Language",
y = "Count",
fill = "Measure"
) +
theme_minimal() +
theme(
legend.position = "bottom",
axis.text.x = element_text(size = 8)
)
g_language_2020_plotly <- ggplotly(g_language_2020) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.15,
xanchor = "center"
)
)
g_language_2020_plotlyrm(g_language_2020)g_language_2021 <-
g_language %>%
filter(
yr == 2021,
(measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
) %>%
mutate(
measure = recode(measure,
copy_count = "Copies",
title_count = "Titles"),
# Rename language as needed:
language = recode(language,
`Кількома мовами народів світу` = "Декілька")
) %>%
ggplot(aes(x = language, y = value, fill = measure)) +
geom_col(position = "dodge") +
scale_y_continuous(breaks = seq(0, 40000, by = 5000), limits = c(0, 40000)) +
labs(
title = "Number of Copies and Titles (2021)",
x = "Language",
y = "Count",
fill = "Measure"
) +
theme_minimal() +
theme(
legend.position = "bottom",
axis.text.x = element_text(size = 8)
)
g_language_2021_plotly <- ggplotly(g_language_2021) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.15,
xanchor = "center"
)
)
g_language_2021_plotlyrm(g_language_2021)g_language_2022 <-
g_language %>%
filter(
yr == 2022,
(measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
) %>%
mutate(
measure = recode(measure,
copy_count = "Copies",
title_count = "Titles"),
# Optional: rename any language here
language = recode(language,
`Кількома мовами народів світу` = "Декілька")
) %>%
ggplot(aes(x = language, y = value, fill = measure)) +
geom_col(position = "dodge") +
scale_y_continuous(breaks = seq(0, 15000, by = 5000), limits = c(0, 15000)) +
labs(
title = "Number of Copies and Titles (2022)",
x = "Language",
y = "Count",
fill = "Measure"
) +
theme_minimal() +
theme(
legend.position = "bottom",
axis.text.x = element_text(size = 8)
)
g_language_2022_plotly <- ggplotly(g_language_2022) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.15,
xanchor = "center"
)
)
g_language_2022_plotlyrm(g_language_2022)g_language_2023 <-
g_language %>%
filter(
yr == 2023,
(measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
) %>%
mutate(
measure = recode(measure,
copy_count = "Copies",
title_count = "Titles"),
# Optional: rename any language as needed
language = recode(language,
`Кількома мовами народів світу` = "Декілька")
) %>%
ggplot(aes(x = language, y = value, fill = measure)) +
geom_col(position = "dodge") +
scale_y_continuous(breaks = seq(0, 25000, by = 5000), limits = c(0, 25000)) +
labs(
title = "Number of Copies and Titles (2023)",
x = "Language",
y = "Count",
fill = "Measure"
) +
theme_minimal() +
theme(
legend.position = "bottom",
axis.text.x = element_text(size = 8)
)
g_language_2023_plotly <- ggplotly(g_language_2023) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.15,
xanchor = "center"
)
)
g_language_2023_plotlyrm(g_language_2023)g_language_2024 <-
g_language %>%
filter(
yr == 2024,
(measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
) %>%
mutate(
measure = recode(measure,
copy_count = "Copies",
title_count = "Titles"),
# Optional: rename any language as needed
language = recode(language,
`Кількома мовами народів світу` = "Декілька")
) %>%
ggplot(aes(x = language, y = value, fill = measure)) +
geom_col(position = "dodge") +
scale_y_continuous(breaks = seq(0, 35000, by = 5000), limits = c(0, 35000)) +
labs(
title = "Number of Copies and Titles (2024)",
x = "Language",
y = "Count",
fill = "Measure"
) +
theme_minimal() +
theme(
legend.position = "bottom",
axis.text.x = element_text(size = 8)
)
g_language_2024_plotly <- ggplotly(g_language_2024) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.15,
xanchor = "center"
)
)
g_language_2024_plotlyrm(g_language_2024)This table has 40 rows, and 16 columns. Also, in the measure column we have only two possible values - title_count means the number of titles and copy_count means the number of copies. From the third column, each column is new publication type.
First graph describes the number of copies of each publication type, by year.
ds_pubtype_l <- ds_pubtype %>%
pivot_longer(
cols = -c(yr, measure),
names_to = "pubtype",
values_to = "value"
)
ds_pubtype_copy <-
ds_pubtype_l %>%
filter(measure == "copy_count") %>%
group_by(yr)
g_pubtype_copy <- ds_pubtype_copy %>%
ggplot(aes(x = pubtype, y = value, fill = pubtype)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 30000, by = 7500), limits = c(0, 37000)) +
coord_cartesian(ylim = c(0, 37000)) +
labs(
title = "Number of Copies by Publication Type and Year",
x = "Publication Type",
y = "Number of Copies (ths.)",
fill = "Publication Type"
) +
theme_get() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank()
,legend.position = "bottom"
, panel.spacing.y = unit(5, "lines")
)
g_pubtype_copy_plotly <- ggplotly(g_pubtype_copy) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_pubtype_copy_plotlyrm(ds_pubtype_l, ds_pubtype_copy, g_pubtype_copy)Second graph describes the number of titles of each publication type, by year.
ds_pubtype_l <- ds_pubtype %>%
pivot_longer(
cols = -c(yr, measure),
names_to = "pubtype",
values_to = "value"
)
ds_pubtype_title <-
ds_pubtype_l %>%
filter(measure == "title_count") %>%
group_by(yr)
g_pubtype_title <- ds_pubtype_title %>%
ggplot(aes(x = pubtype, y = value, fill = pubtype)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 9000, by = 2500), limits = c(0, 9000)) +
coord_cartesian(ylim = c(0, 9000)) +
labs(
title = "Number of Titles by Publication Type and Year",
x = "Publication Type",
y = "Number of Titles",
fill = "Publication Type"
) +
theme_get() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank(),
legend.position = "bottom",
panel.spacing.y = unit(5, "lines")
)
g_pubtype_title_plotly <- ggplotly(g_pubtype_title) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_pubtype_title_plotlyrm(ds_pubtype_title, g_pubtype_title)This table has 40 rows, and 16 columns. Also, in the measure column we have only two possible values - title_count means the number of titles and copy_count means the number of copies. Third column describes the percentage Ukrainian versions to ukr and rus versions. Froths, describes the percentage of russian versions to ukr and rus versions.
First graph describer the number of copies of Ukrainian and russian copies for each year since 2005 to 2024.
g_ukrrus_copies <-
ds_ukr_rus %>%
filter(measure == "copy_count") %>%
select(yr, ukr, rus) %>%
pivot_longer(
cols = c(ukr, rus),
names_to = "language",
values_to = "value"
) %>%
ggplot(aes(x = factor(yr), y = value, fill = language)) +
geom_col(position = "dodge") +
scale_x_discrete(expand = expansion(mult = c(0))) +
labs(
title = "Number of Copies: Ukrainian vs Russian by Year",
x = "Year",
y = "Number of Copies (ths.)",
fill = "Language"
) +
theme_minimal() +
theme(
legend.position = "bottom"
)
g_ukrrus_copies_plotly <- ggplotly(g_ukrrus_copies) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.15,
xanchor = "center"
)
)
g_ukrrus_copies_plotlyrm(g_ukrrus_copies)Second graph describer the number of titles of Ukrainian and russian titles for each year since 2005 to 2024.
g_ukrrus_title <-
ds_ukr_rus %>%
filter(measure == "title_count") %>%
select(yr, ukr, rus) %>%
pivot_longer(
cols = c(ukr, rus),
names_to = "language",
values_to = "value"
) %>%
ggplot(aes(x = factor(yr), y = value, fill = language)) +
geom_col(position = "dodge") +
scale_x_discrete(expand = expansion(mult = c(0))) +
scale_y_continuous(breaks = seq(0, 20000, by = 5000), limits = c(0, 20000)) +
labs(
title = "Number of Titles: Ukrainian vs Russian by Year",
x = "Year",
y = "Number of Titles",
fill = "Language"
) +
theme_minimal() +
theme(
legend.position = "bottom"
)
g_ukrrus_title_plotly <- ggplotly(g_ukrrus_title) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.15,
xanchor = "center"
)
)
g_ukrrus_title_plotlyrm(g_ukrrus_title)This document created to help understand the date more deeply. I hope it was useful for you. 😉